1 Introduction



Weighted pushdown automata (WPDAs) have recently been adopted in applications such as machine
translation [Iglesias et al. 2011] as a more compact alternative to weighted finite-state automata (WFSAs)
for representing a weighted set of strings. Allauzen and Riley [2012] introduce a set of basic algorithms for
the construction and inference of WPDAs, together with an implementation as an extension of the open-source
finite-state transducer toolkit OpenFst.^1



Although a shortest-path algorithm for WPDAs with a bounded stack is described in Allauzen and Riley
[2012], that work does not give a k-shortest-path algorithm, which finds the k shortest accepting paths of the given
automaton. Beyond the single shortest path, the k shortest paths are useful for many purposes, such
as reranking the output in parsing [Collins and Koo 2005] or tuning feature weights in machine translation
[Chiang et al. 2009]. One existing workaround is to first expand the WPDA into an equivalent WFSA and then
find the k shortest paths of the WFSA using the k-shortest-path algorithm for WFSAs (the expansion approach).
Since WPDA expansion has exponential time and space complexity with respect to the size of the
automaton, one usually has to prune the WPDA before expansion (the pruned expansion approach), i.e. remove
those transitions and states that are not on any accepting path whose weight is within a given threshold of
the shortest distance. However, choosing a priori a threshold that neither prunes nor keeps too many
states and transitions is almost impossible in practice.

In this paper, we introduce two efficient algorithms for finding the k shortest paths of a WPDA, both 
derived from the same weighted deductive logic description of the execution of a WPDA using different search 
strategies. 



2 Weighted pushdown automata 

2.1 Formal definitions 



Following Allauzen and Riley [2012], we represent a WPDA as a directed graph with labeled and weighted arcs
(transitions).

Definition 1. A WPDA $M$ over a semiring $(K, \oplus, \otimes, \bar{0}, \bar{1})$ is a tuple $(\Sigma, \Pi, \hat{\Pi}, Q, E, s, f)$, where



^1 http://www.openfst.org/twiki/bin/view/FST/FstExtensions

Figure 1: A WPDA of $\{a^n b^n \mid n > 0\}$



• $\Sigma$, $\Pi$ and $\hat{\Pi}$ are disjoint finite sets of symbols;

• $\Sigma$ is the alphabet of input symbols;

• $\Pi$ and $\hat{\Pi}$ are the alphabets of opening and closing parentheses, respectively; there exists a bijection between them
that pairs the parentheses; for any $a \in \Pi \cup \hat{\Pi}$, we write its counterpart in the other alphabet as $\overline{a}$;

• $Q$ is a finite set of states; $s \in Q$ is the start state and $f \in Q$ is the final state;

• $E \subseteq Q \times (\Sigma \cup \Pi \cup \hat{\Pi} \cup \{\epsilon\}) \times K \times Q$ is a finite set of transitions; $e = (p[e], i[e], w[e], n[e]) \in E$ denotes a
transition from state $p[e]$ to state $n[e]$ with label $i[e]$ and weight $w[e]$, where $w[e] \neq \bar{0}$.

A path $\pi$ is a sequence of transitions $\pi = e_1 e_2 \cdots e_m$ such that $n[e_i] = p[e_{i+1}]$ for all $1 \le i < m$. The functions $p[\cdot]$, $i[\cdot]$,
$w[\cdot]$ and $n[\cdot]$ can all be generalized to paths. For a given path $\pi = e_1 e_2 \cdots e_m$, define $p[\pi] = p[e_1]$, $n[\pi] = n[e_m]$,
$i[\pi] = i[e_1] i[e_2] \cdots i[e_m]$, and $w[\pi] = w[e_1] \otimes w[e_2] \otimes \cdots \otimes w[e_m]$. Unlike in a WFSA, not all paths from $s$ to $f$ in a
WPDA are accepting paths. For a set of symbols $S$, let $c_S[\pi]$ be the substring of $i[\pi]$ consisting of all and only
the symbols from $S$. For example, $c_{\Pi \cup \hat{\Pi}}[\pi]$ is the substring of $i[\pi]$ consisting of all and only the opening
and closing parentheses. Then,

Definition 2. The Dyck language on finite parenthesis alphabets $\Pi$ and $\hat{\Pi}$ consists of strings of balanced parentheses. A
path $\pi$ is balanced if $c_{\Pi \cup \hat{\Pi}}[\pi]$ belongs to the Dyck language on $\Pi$ and $\hat{\Pi}$.

For example, when $\Pi = \{$'(', '['$\}$ and $\hat{\Pi} = \{$')', ']'$\}$ with the usual pairing by appearance, strings such as () and
([()])[] are members of the Dyck language, while ( and (][) are not.
Finally,

Definition 3. A path $\pi$ is an accepting path if and only if $p[\pi] = s$, $n[\pi] = f$ and $\pi$ is balanced.
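As an illustration of Definitions 2 and 3, the balance test on a path's label sequence can be sketched with an explicit stack. This is only a minimal sketch: the function name `is_balanced` and the dictionary `PAIRS` (opening parenthesis mapped to its closing counterpart) are hypothetical names chosen for the example, not part of the paper or of OpenFst.

```python
def is_balanced(labels, pairs):
    """Return True if the parentheses occurring in `labels` form a Dyck string.

    `pairs` maps each opening parenthesis to its closing counterpart.
    Any other label (input symbol or epsilon) is ignored, which mirrors
    restricting i[pi] to the parenthesis alphabets.
    """
    closing = set(pairs.values())
    stack = []
    for label in labels:
        if label in pairs:          # opening parenthesis: push it
            stack.append(label)
        elif label in closing:      # closing parenthesis: pop and compare
            if not stack or pairs[stack.pop()] != label:
                return False
    return not stack                # no unmatched opening parenthesis remains


# Hypothetical pairing used for the examples in the text.
PAIRS = {"(": ")", "[": "]"}
```

A path from s to f would then be accepting exactly when `is_balanced` holds for its label sequence; the strings from the example above behave as expected, e.g. `is_balanced("([()])[]", PAIRS)` holds while `is_balanced("(][)", PAIRS)` does not.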

This representation of WPDAs is slightly different from the classical representation of PDAs, where a
stack alphabet is defined with optional push or pop operations at each transition. Here the stack alphabet
is essentially $\Pi$ and $\hat{\Pi}$, paired by the bijection between them. Consuming a symbol from $\Pi$
is equivalent to pushing that symbol onto the stack in the classical representation; consuming a
symbol from $\hat{\Pi}$ is equivalent to popping a symbol off the stack and checking that it is the
counterpart from $\Pi$. As discussed in Allauzen and Riley [2012], this representation makes it easy to adapt
some WFSA algorithms to similar purposes on a WPDA.

Following Allauzen and Riley [2012], we limit our effort in finding k shortest paths to WPDAs with a
bounded stack in both pushing and popping.^2

Definition 4. A WPDA has a bounded stack if there exists an integer $K$ such that for any path $\pi$, the number of
unmatched parentheses in $c_{\Pi \cup \hat{\Pi}}[\pi]$ is no greater than $K$.

Although this rules out all WPDAs with recursion, the ones found in applications that need to find the
k shortest paths usually do not have recursion [Iglesias et al. 2011]. Thus an algorithm that only works on
WPDAs with a bounded stack is already very useful.

Allauzen and Riley [2012] give a general algorithm for converting a context-free grammar into an equivalent
WPDA. Figure 1 is an example WPDA representing the classical context-free language $\{a^n b^n \mid n > 0\}$, constructed
following their algorithm. It is easy to see that this WPDA does not have a bounded stack. However, treating
it as a "grammar", one can "parse" strings with the grammar by encoding the input as a WFSA and
intersecting the WPDA with it. For example, Figure 3 is the result of intersecting Figure 2 with Figure 1, which
now has a bounded stack.

^2 This definition is slightly different from Allauzen and Riley [2012], which only bounds pushing.

Figure 2: Input string "aabb" encoded as a WFSA

Figure 3: The result of intersection



2.2 Automata execution as weighted deduction 

A deductive logic defines a space of weighted items, some of which are axioms or goals (items to prove), and
a set of inference rules of the form

$$\frac{A_1 : w_1 \quad A_2 : w_2 \quad \cdots \quad A_m : w_m}{B : g(w_1, w_2, \ldots, w_m)}\ \phi$$

which means that if items $A_1, A_2, \ldots, A_m$ are provable with weights $w_1, w_2, \ldots, w_m$ respectively, then item $B$ is also
provable with weight $g(w_1, w_2, \ldots, w_m)$, given that the side condition $\phi$ is satisfied. We also call $B$ proved this way
an instantiation of $B$ with weight $g(w_1, w_2, \ldots, w_m)$. This style of system has been commonly used to express
parsing strategies since Shieber et al. [1995].



The execution of a WPDA $M$ can be described using the following weighted deductive logic $\mathcal{L}_M$.

• The items are of the form $q_1 \leadsto q_2$, where $q_1, q_2 \in Q$. An instantiation $q_1 \leadsto q_2 : u$ for some $u \in K$
intuitively means there is a balanced path from $q_1$ to $q_2$ with weight $u$.

• Axioms are

$$\frac{}{q \leadsto q : \bar{1}}\quad q = s \text{ or there exists } e \in E \text{ such that } n[e] = q \text{ and } i[e] \in \Pi$$

Furthermore, we call any state $q$ an entering state if $q \leadsto q : \bar{1}$ is an axiom.

• There are two inference rules,

1. Scan

$$\frac{q \leadsto p[e] : u}{q \leadsto n[e] : u \otimes w[e]}\quad e \in E \text{ such that } i[e] \in \Sigma \cup \{\epsilon\}$$

2. Complete

$$\frac{q \leadsto p[e_1] : u_1 \quad n[e_1] \leadsto p[e_2] : u_2}{q \leadsto n[e_2] : u_1 \otimes w[e_1] \otimes u_2 \otimes w[e_2]}\quad e_1, e_2 \in E \text{ such that } i[e_1] \in \Pi,\ i[e_2] \in \hat{\Pi},\ \overline{i[e_1]} = i[e_2]$$
Figure 4: A proof of the accepting path of "aabb"

• The only goal item is $s \leadsto f$.

Any valid proof forms a tree that induces a path. The induced path can be obtained by reading off the
transitions in side conditions through a left-to-right post-order traversal of the proof tree. Take the WPDA in
Figure 1 for example; Figure 4 is a proof of the accepting path of the string "aabb". The accepting path is thus

(7) (1) (4) (2) (3) (5) (6) (8), i.e. qi ^ qi ^ qi ^ qi ^ qi ^ q^ ^ qi ^ q^ ^ qi- 

One can easily prove the following by induction for any WPDA $M$ (see the appendix).^3

Theorem 1 (Soundness). Any valid proof of an instantiation $q_1 \leadsto q_2 : u$ in $\mathcal{L}_M$ induces a balanced path from $q_1$ to $q_2$
with weight $u$ in $M$.

Theorem 2 (Completeness). Any balanced path from an entering state $q_1$ to some state $q_2$ with weight $u$ in $M$ has a
valid proof of an instantiation $q_1 \leadsto q_2 : u$ in $\mathcal{L}_M$ whose induced path is that path.

Theorem 3 (In-ambiguity). Any balanced path from an entering state in $M$ has a unique proof in $\mathcal{L}_M$.^4

The three properties together essentially state that there is a one-to-one correspondence between proofs of
goal items in $\mathcal{L}_M$ and accepting paths in $M$.

2.3 The k-shortest-path problem

The k-shortest-path problem on a WPDA $M$ with a bounded stack is to find the $k$ accepting paths of $M$ with the
smallest weights with respect to the natural ordering of $M$'s weight semiring.^5
The natural ordering $\le\ \subseteq K \times K$ is defined as follows.

Definition 5. For any $a, b \in K$, $a \le b$ if and only if $a \oplus b = a$.

For the problem to be well-defined, the natural ordering also has to be total, which is equivalent to requiring
the $\oplus$ operator to have the following path property: for any $a, b \in K$, $a \oplus b = a$ or $a \oplus b = b$. An example meeting
these conditions is the tropical semiring $(\mathbb{R} \cup \{\infty\}, \min, +, \infty, 0)$, one of the most commonly used for weights in
parsing and machine translation. Its natural ordering is simply the ordering of real numbers and infinity.
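To make Definition 5 and the path property concrete, the following sketch encodes a semiring as a small Python record and derives the natural ordering from its addition. The class name `Semiring`, its field names, and the constant `TROPICAL` are assumptions made for this example only.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # the "oplus" operation
    times: Callable[[float, float], float]  # the "otimes" operation
    zero: float                             # identity of plus
    one: float                              # identity of times

    def leq(self, a, b):
        """Natural ordering: a <= b if and only if a (+) b == a."""
        return self.plus(a, b) == a

    def has_path_property(self, a, b):
        """Path property for one pair: a (+) b is either a or b."""
        return self.plus(a, b) in (a, b)


# Tropical semiring (R u {inf}, min, +, inf, 0).
TROPICAL = Semiring(plus=min, times=lambda a, b: a + b, zero=math.inf, one=0.0)

# A semiring without the path property, for contrast: ordinary (+, *).
REAL = Semiring(plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0, one=1.0)
```

With `TROPICAL`, `leq` coincides with the usual ordering on reals and infinity, whereas `REAL.has_path_property(1.0, 2.0)` fails because 1 + 2 is neither operand, so the k-shortest-path problem would not be well-defined over that semiring.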



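3 Computing the Shortest Distance

One of the benefits of the above weighted deduction representation is that many properties can be computed
by carrying out the deductions in a uniform style. As a starting point, we are interested in finding the
smallest-weight instantiation of some item $q_1 \leadsto q_2$. For reasons which will become clear later, we call the weight of that
instantiation the inside weight of $q_1 \leadsto q_2$. Let $R$ be the set of all instantiations of provable items. Because of the
path property, computing the inside weight of $q_1 \leadsto q_2$ is equivalent to computing

$$\alpha(q_1 \leadsto q_2) = \bigoplus_{\{u \,\mid\, q_1 \leadsto q_2 : u \,\in\, R\}} u$$

^3 Note especially that a bounded stack is not required.
^4 Up to the tree structure with side conditions.
^5 In the rest of this paper, we always assume the WPDA M has a bounded stack.

The sum can be further grouped by the last step taken in a proof of $q_1 \leadsto q_2 : u$. Define $A(q_1 \leadsto q_2)$ to be
the following,

$$A(q_1 \leadsto q_2) = \begin{cases} \bar{1} & q_1 \leadsto q_2 \text{ is an axiom} \\ \bar{0} & \text{otherwise} \end{cases}$$

Define $S_{q_1 \leadsto q_2} \subseteq E$ to be the set of "last steps taken" to prove $q_1 \leadsto q_2$ with a Scan, i.e. $e$ is in $S_{q_1 \leadsto q_2}$ if and only
if some instantiation $q_1 \leadsto p[e] : u$ with $e$ as the side condition can prove $q_1 \leadsto q_2$ with the Scan rule. Similarly,
define $C_{q_1 \leadsto q_2} \subseteq E \times E$ to be the set of "last steps taken" to prove $q_1 \leadsto q_2$ with a Complete, i.e. $(e_1, e_2)$ is in $C_{q_1 \leadsto q_2}$
if and only if some instantiations $q_1 \leadsto p[e_1] : u_1$ and $n[e_1] \leadsto p[e_2] : u_2$ can prove $q_1 \leadsto q_2$ with the Complete
rule. Then, $\alpha(q_1 \leadsto q_2)$ can be rewritten as

$$\begin{aligned}
\alpha(q_1 \leadsto q_2) &= A(q_1 \leadsto q_2) \oplus \Bigl[\bigoplus_{\substack{e \in S_{q_1 \leadsto q_2} \\ q_1 \leadsto p[e] : u \,\in\, R}} u \otimes w[e]\Bigr] \oplus \Bigl[\bigoplus_{\substack{(e_1, e_2) \in C_{q_1 \leadsto q_2} \\ q_1 \leadsto p[e_1] : u_1 \,\in\, R \\ n[e_1] \leadsto p[e_2] : u_2 \,\in\, R}} u_1 \otimes w[e_1] \otimes u_2 \otimes w[e_2]\Bigr] \\
&= A(q_1 \leadsto q_2) \oplus \Bigl[\bigoplus_{e \in S_{q_1 \leadsto q_2}} \alpha(q_1 \leadsto p[e]) \otimes w[e]\Bigr] \oplus \Bigl[\bigoplus_{(e_1, e_2) \in C_{q_1 \leadsto q_2}} \alpha(q_1 \leadsto p[e_1]) \otimes w[e_1] \otimes \alpha(n[e_1] \leadsto p[e_2]) \otimes w[e_2]\Bigr]
\end{aligned}$$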

This recursive formulation allows us to compute the shortest distance of an item using the shortest distances
of its component sub-items. When the WPDA $M$ has a bounded stack, one can easily derive an algorithm that
computes the shortest distance using $\mathcal{L}_M$. Figure 5 is a simple example of such an algorithm. This algorithm
carries out standard agenda-based reasoning with the relaxation technique [Cormen et al. 2009], where $Q$
is the agenda. The map $\alpha$ maintains the current estimate of each proven item's inside weight. Lines 4-7 seed
the axioms as the starting point of reasoning. Then lines 8-26 try to prove new items by applying the Scan
rule (lines 12-13) and the Complete rule (lines 14-24). Any item that is newly proven or proven with a smaller
weight is added back to the agenda in the Relax function.
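The agenda-based procedure described above can be sketched in Python for the tropical semiring (⊕ is min, ⊗ is +). This is a minimal sketch under stated assumptions rather than the paper's implementation: transitions are assumed to be tuples `(source, label, weight, target)`, `pairs` is a hypothetical map from opening to closing parentheses, and the four-transition automaton at the end is an invented example, not one of the paper's figures.

```python
import math
from collections import deque


def inside(transitions, start, pairs):
    """Inside weights alpha[(q1, q2)] over the tropical semiring."""
    opening_of = {c: o for o, c in pairs.items()}
    out, into = {}, {}
    for e in transitions:
        out.setdefault(e[0], []).append(e)
        into.setdefault(e[3], []).append(e)

    alpha, agenda = {}, deque()

    def relax(item, w):
        # Keep the smaller weight; re-queue the item when it improves.
        if w < alpha.get(item, math.inf):
            alpha[item] = w
            if item not in agenda:
                agenda.append(item)

    # Axioms: the start state and every target of an opening parenthesis.
    for q in {start} | {e[3] for e in transitions if e[1] in pairs}:
        relax((q, q), 0.0)

    while agenda:
        item = agenda.popleft()
        q1, q2 = item
        u = alpha[item]
        for (_, label, w, nxt) in out.get(q2, []):
            if label in pairs:
                # Complete, popped item is the left antecedent.
                for (src2, label2, w2, nxt2) in transitions:
                    if label2 == pairs[label] and (nxt, src2) in alpha:
                        relax((q1, nxt2), u + w + alpha[(nxt, src2)] + w2)
            elif label in opening_of:
                # Complete, popped item is the right antecedent.
                for (src1, label1, w1, _) in into.get(q1, []):
                    if label1 == opening_of[label]:
                        for (q, r), v in list(alpha.items()):
                            if r == src1:
                                relax((q, nxt), v + w1 + u + w)
            else:
                # Scan over an input symbol or epsilon.
                relax((q1, nxt), u + w)
    return alpha


# Invented example: 0 -(-> 1 -a or b-> 2 -)-> 3, start 0, final 3.
EXAMPLE = [(0, "(", 1.0, 1), (1, "a", 2.0, 2), (1, "b", 5.0, 2), (2, ")", 1.0, 3)]
ALPHA = inside(EXAMPLE, 0, {"(": ")"})
```

In this example the inside weight of the goal item, `ALPHA[(0, 3)]`, is the weight 1 + 2 + 1 of the path through the cheaper `a` transition. The sketch assumes a bounded stack and no negative-weight cycles, so the agenda eventually empties.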

The above algorithm is conveniently derived from the weighted deduction system using standard techniques.
Nevertheless, other strategies can also be used; for example, the shortest-path algorithm
in Allauzen and Riley [2012] essentially computes the inside weights with a multi-agenda strategy.



4 Algorithm 1 



Having discussed the shortest distance problem in a WPDA, we now move on to the k-shortest-path problem.
The key idea of our first algorithm is similar to the A* k-best parsing algorithm in Pauls and Klein [2009]. As
we have shown in Section 2.2, similar to parsing, the execution of a WPDA can be described as a weighted
deductive logic. The generalized A* search algorithm from Felzenszwalb and McAllester [2007] can then be
applied with a monotonic and admissible heuristic function to find the k instantiations of the goal item with
the smallest weights, from which we get the k shortest paths. The outside weight of items can be defined with
a meaning similar to that in parsing and used as an exact heuristic. Another, inexact heuristic will also be discussed,
which will eventually lead to our second algorithm.
 1: function Inside
 2:     α ← empty map
 3:     Q ← empty queue
 4:     for all entering state q do
 5:         Push(q ⇝ q, Q)
 6:         α[q ⇝ q] ← 1̄
 7:     end for
 8:     while Q is not empty do
 9:         q1 ⇝ q2 ← Pop(Q)
10:         u ← α[q1 ⇝ q2]
11:         for all transition e such that p[e] = q2 do
12:             if i[e] ∈ Σ ∪ {ε} then                                  ▷ Scan
13:                 Relax(q1 ⇝ n[e], u ⊗ w[e])
14:             else if i[e] ∈ Π then                                   ▷ Complete; as the left antecedent
15:                 for all e' such that i[e'] is the counterpart of i[e] and n[e] ⇝ p[e'] is in α do
16:                     Relax(q1 ⇝ n[e'], u ⊗ w[e] ⊗ α[n[e] ⇝ p[e']] ⊗ w[e'])
17:                 end for
18:             else if i[e] ∈ Π̂ then                                   ▷ Complete; as the right antecedent
19:                 for all e' such that i[e'] is the counterpart of i[e] and n[e'] = q1 do
20:                     for all q such that q ⇝ p[e'] is in α do
21:                         Relax(q ⇝ n[e], α[q ⇝ p[e']] ⊗ w[e'] ⊗ u ⊗ w[e])
22:                     end for
23:                 end for
24:             end if
25:         end for
26:     end while
27: end function
28:
29: function Relax(q1 ⇝ q2, w)
30:     if q1 ⇝ q2 is in α then
31:         u ← α[q1 ⇝ q2] ⊕ w
32:         if u ≠ α[q1 ⇝ q2] then
33:             α[q1 ⇝ q2] ← u
34:             Push(q1 ⇝ q2, Q) if q1 ⇝ q2 not already in Q
35:         end if
36:     else
37:         α[q1 ⇝ q2] ← w
38:         Push(q1 ⇝ q2, Q) if q1 ⇝ q2 not already in Q
39:     end if
40: end function

Figure 5: A simple Inside algorithm
S ← empty set of proven instantiations
Q ← empty min-priority queue
for all axiom A : u do
    Push(A : u, Q) with priority H(A : u)
end for
while Q is not empty do
    A : u ← Pop(Q)
    if A : u is a goal item then
        Output A : u
    end if
    Add A : u to S
    for all new instantiation B : v provable using A : u and any member of S do
        Push(B : v, Q) with priority H(B : v)
    end for
end while

Figure 6: The generalized A* algorithm



4.1 A* search on a deductive logic 

Felzenszwalb and McAllester [2007] introduce the generalized A* search algorithm on a deductive logic.
Although the original algorithm assumes the weights are from a positive tropical semiring, this is not a necessary
requirement in our problem, as we show next.

Similar to the original A* algorithm on graphs [Hart et al. 1968], we need a heuristic function $H$ to estimate
the final weight continuing from the current search state (an instantiation in this case) to the closest goal item.
More formally, for a weighted logic $\mathcal{L}$ with (unweighted) item space $I$ on semiring $(K, \oplus, \otimes, \bar{0}, \bar{1})$, a heuristic
function $H : (I, K) \to K$ is any function satisfying the following.

Admissibility For any provable instantiation of the goal item $G : w$,

$$H(G : w) = w$$

Monotonicity For any provable instantiations $A_1 : w_1, A_2 : w_2, \ldots, A_m : w_m$, an inference rule

$$\frac{A_1 : w_1 \quad A_2 : w_2 \quad \cdots \quad A_m : w_m}{B : g(w_1, w_2, \ldots, w_m)}$$

and $1 \le i \le m$,

$$H(A_i : w_i) \le H(B : g(w_1, w_2, \ldots, w_m))$$

where $\le$ is the natural ordering of the semiring.

With such an $H$, the A* algorithm on a deductive logic can then be described as in Figure 6.

Similar to the original A* algorithm, the following property holds for the generalized A* algorithm as well:

Theorem 4. If a monotonic $H$ is used, the generalized A* algorithm pops instantiations in increasing order of their $H$
value.

The proof of the tropical semiring case can be found in Felzenszwalb and McAllester [2007]. We include the
proof simply to show that this is the case with any monotonic heuristic function and any semiring with the path
property, not just the tropical semiring.
Figure 7: Inside and outside weights on a shortest path



Proof. Suppose some instantiation is not popped in order of the $H$ value. Let the instantiations popped in order
be $A_1 : w_1, A_2 : w_2, \ldots$ and let $i$ be the smallest index such that $H(A_{i-1} : w_{i-1}) > H(A_i : w_i)$. Right before
$A_{i-1} : w_{i-1}$ is popped, $A_i : w_i$ cannot be inside $Q$, otherwise it would be popped instead. This means $A_i : w_i$ is
added into $Q$ after popping $A_{i-1} : w_{i-1}$, by applying some inference rule with $A_{i-1} : w_{i-1}$. The application is
of the form

$$\frac{\cdots \quad A_{i-1} : w_{i-1} \quad \cdots}{A_i : g(\ldots, w_{i-1}, \ldots)}$$

Because $H$ is monotonic,

$$H(A_{i-1} : w_{i-1}) \le H(A_i : g(\ldots, w_{i-1}, \ldots)) = H(A_i : w_i)$$

This contradicts the assumption $H(A_{i-1} : w_{i-1}) > H(A_i : w_i)$. □



If $H$ is also admissible, then for any instantiation of a goal item $G : w$, $H(G : w)$ is just $w$. Thus such
instantiations are popped in increasing order of their weights, and the first $k$ such instantiations popped are the
ones with the smallest weight.
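The pop-in-order behaviour can be illustrated on the simplest special case, a logic with only the Scan rule, i.e. an ordinary weighted graph over the tropical semiring with non-negative weights. The sketch below is an assumption-laden illustration, not the generalized algorithm of Figure 6: edges are assumed to be `(source, weight, target)` tuples, `exact_heuristic` is a hypothetical helper that runs Dijkstra on the reversed graph, and the instantiation `state : u` is queued with priority `u + h[state]`.

```python
import heapq
import math


def exact_heuristic(edges, goal):
    """Shortest distance from every state to `goal` (Dijkstra on reversed edges)."""
    rev = {}
    for src, w, dst in edges:
        rev.setdefault(dst, []).append((w, src))
    dist = {goal: 0.0}
    queue = [(0.0, goal)]
    while queue:
        d, q = heapq.heappop(queue)
        if d > dist.get(q, math.inf):
            continue  # stale queue entry
        for w, prev in rev.get(q, []):
            if d + w < dist.get(prev, math.inf):
                dist[prev] = d + w
                heapq.heappush(queue, (d + w, prev))
    return dist


def k_shortest_weights(edges, start, goal, k):
    """Weights of the k shortest start-to-goal paths, in increasing order."""
    h = exact_heuristic(edges, goal)
    if start not in h:
        return []
    out = {}
    for src, w, dst in edges:
        out.setdefault(src, []).append((w, dst))
    queue = [(h[start], 0.0, start)]  # (priority, weight so far, state)
    found = []
    while queue and len(found) < k:
        _, u, q = heapq.heappop(queue)
        if q == goal:
            found.append(u)           # goal instantiations pop in weight order
        for w, nxt in out.get(q, []):
            if nxt in h:              # skip states that cannot reach the goal
                heapq.heappush(queue, (u + w + h[nxt], u + w, nxt))
    return found


# Invented example: two parallel arcs 0 -> 1 and two parallel arcs 1 -> 2.
EDGES = [(0, 1.0, 1), (0, 4.0, 1), (1, 1.0, 2), (1, 3.0, 2)]
```

Because the heuristic is exact, the priority of every instantiation equals the weight of the best accepting path it can extend to, so `k_shortest_weights(EDGES, 0, 2, 3)` returns the three smallest path weights in order without ever expanding the most expensive combination.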



4.2 Outside weight as an exact heuristic 

For a given instantiation $q_1 \leadsto q_2 : u$, we want the heuristic to tell us the weight of the shortest accepting path
continuing from this instantiation. Such a heuristic is trivially monotonic and admissible. Let $\pi$ be the path
induced by $q_1 \leadsto q_2 : u$, and define

$$H_1(q_1 \leadsto q_2 : u) = \bigoplus_{\mu, \nu} w[\mu] \otimes u \otimes w[\nu]$$

where the sum is over all pairs of prefixes and suffixes of transitions such that $\mu \pi \nu$ forms an accepting path.
When the semiring is commutative, the heuristic has a simple form. Define $\beta(q_1 \leadsto q_2) = \bigoplus_{\mu, \nu} w[\mu] \otimes w[\nu]$,
then

$$H_1(q_1 \leadsto q_2 : u) = \beta(q_1 \leadsto q_2) \otimes u$$

We call $\beta(q_1 \leadsto q_2)$ the outside weight of $q_1 \leadsto q_2$, because on the shortest accepting path going through
$q_1 \leadsto q_2$, $\beta(q_1 \leadsto q_2)$ is the weight of the partial path "outside" of $q_1 \leadsto q_2$, as illustrated in Figure 7. This
can be easily computed by applying the Scan and Complete rules in reverse, starting from the goal after the
inside weights have been computed. See Figure 8 for a simple algorithm. Very similar to the inside algorithm
in Figure 5, we use agenda-based reasoning, but with the goal item as the starting point (lines 5-6). Then lines
7-20 try to propagate the estimates to inner items by applying inference rules in reverse.
 1: function Outside
 2:     α ← the inside weights from Inside
 3:     β ← empty map
 4:     Q ← empty queue
 5:     β[s ⇝ f] ← 1̄
 6:     Push(s ⇝ f, Q)
 7:     while Q is not empty do
 8:         q1 ⇝ q2 ← Pop(Q)
 9:         u ← β[q1 ⇝ q2]
10:         for all incoming e of q2 do
11:             if i[e] ∈ Σ ∪ {ε} then                                  ▷ Scan in reverse
12:                 Relax(q1 ⇝ p[e], u ⊗ w[e])
13:             else if i[e] ∈ Π̂ then                                   ▷ Complete in reverse
14:                 for all e' such that i[e'] is the counterpart of i[e], and q1 ⇝ p[e'] and n[e'] ⇝ p[e] both in α do
15:                     Relax(q1 ⇝ p[e'], u ⊗ w[e'] ⊗ α[n[e'] ⇝ p[e]] ⊗ w[e])
16:                     Relax(n[e'] ⇝ p[e], u ⊗ w[e'] ⊗ α[q1 ⇝ p[e']] ⊗ w[e])
17:                 end for
18:             end if
19:         end for
20:     end while
21: end function
22:
23: function Relax(q1 ⇝ q2, w)
24:     if q1 ⇝ q2 is in β then
25:         u ← β[q1 ⇝ q2] ⊕ w
26:         if u ≠ β[q1 ⇝ q2] then
27:             β[q1 ⇝ q2] ← u
28:             Push(q1 ⇝ q2, Q) if q1 ⇝ q2 not already in Q
29:         end if
30:     else
31:         β[q1 ⇝ q2] ← w
32:         Push(q1 ⇝ q2, Q) if q1 ⇝ q2 not already in Q
33:     end if
34: end function

Figure 8: A simple Outside algorithm
Figure 9: $\gamma(q_1 \leadsto q_2) = D(q_2, q_3) \oplus D(q_2, q_4)$ is the shortest distance from $q_2$ to an "exit"

Figure 10: The reversed WPDA of Figure 9

4.3 An inexact heuristic and its problems

The above heuristic is very effective in the search because the outside weight gives an exact estimate. However,
pre-computation of the outside weight requires two passes over the automaton. A natural question is
whether there is an inexact heuristic, still monotonic and admissible, which takes less time to compute.

When multiplication does not decrease the weight,^6 one may use the weight of only part of the final
shortest accepting path as an estimate. This can produce a heuristic that is less expensive to compute, possibly
at the cost of increasing the search time. In particular, define $D(q_1, q_2)$ to be the shortest distance between any
pair of states $q_1$ and $q_2$, and $\gamma(q_1 \leadsto q_2) = \bigoplus_{q_3} D(q_2, q_3)$, where the summation is over all states $q_3$ reachable from
$q_2$ that have a closing parenthesis, or simply $f$ when $q_1$ is $s$ (call such a state an exiting state, in contrast with an
entering state). $\gamma(q_1 \leadsto q_2)$ is roughly how far away $q_1 \leadsto q_2$ is from a pair of immediately enclosing parentheses.^7
For example, in Figure 9, $\gamma(q_1 \leadsto q_2) = D(q_2, q_3) \oplus D(q_2, q_4)$ is the shortest distance from $q_2$ to the exiting states
$q_3$ and $q_4$. All the relevant values of $D$ are in fact the inside weights of the reversed WPDA of $M$ (for example,
see Figure 10);^8 therefore we call it the reverse inside weight.



Then, we can define the following heuristic,

$$H_2(q_1 \leadsto q_2 : u) = u \otimes \gamma(q_1 \leadsto q_2)$$



^6 For example, the tropical semiring with only non-negative weights in the setting of the classical shortest-path problem on a graph,
where real-valued weights are summed within the path and the minimum is taken (i.e. use $+$ as $\otimes$ and $\min$ as $\oplus$).

^7 This is only a rough estimate since there may not be an opening parenthesis going to $q_1$ that matches the closing parenthesis of the
selected exiting state. However, the actual shortest distance is never smaller than this, which means the estimate is still admissible.

^8 That is, reverse the direction of transitions; swap $s$ and $f$; and swap $\Pi$ and $\hat{\Pi}$.
Figure 11: A problematic WPDA for $H_2$; weights are in the tropical semiring

Item                    β    γ
$s \leadsto s$          3    3
$s \leadsto q_6$        1    1
$s \leadsto f$          0    0
$q_1 \leadsto q_1$      3    0
$q_1 \leadsto q_2$      2    1
$q_1 \leadsto q_3$      4    0
$q_1 \leadsto q_4$      1    0
$q_1 \leadsto q_5$      4    0
$q_1 \leadsto q_7$      4    0
$q_1 \leadsto q_8$      4    0

Table 1



This gives us the weight of the shortest path starting with the induced path of $q_1 \leadsto q_2 : u$ and ending at any exiting state,
which may be a part of an accepting path. It is trivially admissible because $\gamma(s \leadsto f) = D(f, f) = \bar{1}$. When
multiplication does not decrease the weight, the weight of part of a path is always smaller than or equal to the
weight of the whole path. Therefore, the heuristic is monotonic. Unlike the outside weight, the semiring does
not need to be commutative for this heuristic to be well-defined.

Unfortunately, though we now spend less time pre-computing the heuristic, the actual A* search usually
ends up taking much longer because of the inexactness. To see why, notice that any instantiation of an item
with a weight smaller than the shortest accepting path has to be visited, even if it will only be used in an
accepting path far longer than the k shortest ones. For example, consider the WPDA in Figure 11 with the
relevant values of $\beta$ and $\gamma$ listed in Table 1. When using $H_1$, the following are instantiated before reaching the
1-shortest path:



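$s \leadsto s : 0$          ($H_1 = 0 + 3 = 3$, $Q = \{q_1 \leadsto q_1 : 0\}$)
$q_1 \leadsto q_1 : 0$      ($H_1 = 0 + 3 = 3$, $Q = \{q_1 \leadsto q_2 : 1,\ q_1 \leadsto q_3 : 0\}$)
$q_1 \leadsto q_2 : 1$      ($H_1 = 1 + 2 = 3$, $Q = \{q_1 \leadsto q_4 : 2,\ q_1 \leadsto q_3 : 0\}$)
$q_1 \leadsto q_4 : 2$      ($H_1 = 2 + 1 = 3$, $Q = \{s \leadsto q_6 : 2,\ q_1 \leadsto q_3 : 0\}$)
$s \leadsto q_6 : 2$        ($H_1 = 2 + 1 = 3$, $Q = \{s \leadsto f : 3,\ q_1 \leadsto q_3 : 0\}$)
$s \leadsto f : 3$          ($H_1 = 3 + 0 = 3$, $Q = \{q_1 \leadsto q_3 : 0\}$)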
Q ← empty min-priority queue
Push((1, 1), Q) with priority A_1 + B_1
while Q is not empty do
    (i, j) ← Pop(Q)
    Output (A_i, B_j)
    if (i + 1, j) not already in Q then
        Push((i + 1, j), Q) with priority A_{i+1} + B_j
    end if
    if (i, j + 1) not already in Q then
        Push((i, j + 1), Q) with priority A_i + B_{j+1}
    end if
end while

Figure 12



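However, when $H_2$ is used, the following are instantiated before the 1-shortest path.

$q_1 \leadsto q_1 : 0$      ($H_2 = 0 + 0 = 0$, $Q = \{q_1 \leadsto q_3 : 0,\ q_1 \leadsto q_2 : 1,\ s \leadsto s : 0\}$)
$q_1 \leadsto q_3 : 0$      ($H_2 = 0 + 0 = 0$, $Q = \{q_1 \leadsto q_5 : 0,\ q_1 \leadsto q_2 : 1,\ s \leadsto s : 0\}$)
$q_1 \leadsto q_5 : 0$      ($H_2 = 0 + 0 = 0$, $Q = \{q_1 \leadsto q_7 : 0,\ q_1 \leadsto q_2 : 1,\ s \leadsto s : 0\}$)
$q_1 \leadsto q_7 : 0$      ($H_2 = 0 + 0 = 0$, $Q = \{q_1 \leadsto q_8 : 0,\ q_1 \leadsto q_2 : 1,\ s \leadsto s : 0\}$)
$q_1 \leadsto q_8 : 0$      ($H_2 = 0 + 0 = 0$, $Q = \{q_1 \leadsto q_2 : 1,\ s \leadsto s : 0,\ s \leadsto f : 4\}$)
$q_1 \leadsto q_2 : 1$      ($H_2 = 1 + 1 = 2$, $Q = \{q_1 \leadsto q_4 : 2,\ s \leadsto s : 0,\ s \leadsto f : 4\}$)
$q_1 \leadsto q_4 : 2$      ($H_2 = 2 + 0 = 2$, $Q = \{s \leadsto s : 0,\ s \leadsto f : 4\}$)
$s \leadsto s : 0$          ($H_2 = 0 + 3 = 3$, $Q = \{s \leadsto q_6 : 2,\ s \leadsto f : 4\}$)
$s \leadsto q_6 : 2$        ($H_2 = 2 + 1 = 3$, $Q = \{s \leadsto f : 3,\ s \leadsto f : 4\}$)
$s \leadsto f : 3$          ($H_2 = 3 + 0 = 3$, $Q = \{s \leadsto f : 4\}$)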



$H_2$ ends up visiting more instantiations along the path from $q_1$ to $q_8$ because that path is shorter within the scope
of the enclosing parentheses. The closing parenthesis that follows completely reverses the ranking, but this is
information that $H_1$ "knows" while $H_2$ does not. In practice, we find this happens so frequently that $H_2$ fails to
output the shortest path within a reasonable amount of time.

Another problem with $H_2$ is that when multiplication may decrease the weight (for example, in the tropical
semiring with negative weights, which is commonly used in applications such as machine translation), the
heuristic is no longer monotonic.



5 Algorithm 2 

We can adopt a new search strategy to address the problems with $H_2$. Before describing the algorithm, we take
a brief excursion to introduce a technique from Huang and Chiang [2005]. Consider the following problem:

Let $A$ and $B$ be two (possibly infinite) ordered sequences of real numbers (i.e. for any $i$, $A_i \le A_{i+1}$
and $B_i \le B_{i+1}$). Find the $k$ smallest elements in $A \times B$, ordered by the sum of the pair.

For example, when $A$ is $\{0, 2, 2\}$ and $B$ is $\{1, 2, 4\}$, the 3 smallest elements are $\{(0, 1), (0, 2), (2, 1)\}$. A naive
solution is to compute the first $k$ elements in both $A$ and $B$, then sort all the $k^2$ combinations. The technique
from Huang and Chiang [2005], described in Figure 12, visits at most $2k$ combinations and usually a lot fewer
in practice. The key insight is that there is no need to explore $(A_{i+1}, B_j)$ or $(A_i, B_{j+1})$ before $(A_i, B_j)$ is popped,
because neither of them can be better than $(A_i, B_j)$. When computing elements in
$A$ and $B$ is expensive, this technique is substantially faster than the naive solution.
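The lazy enumeration just described can be sketched with Python's `heapq`. This is a hedged sketch rather than the exact procedure of Huang and Chiang [2005]: it assumes finite, ascending lists instead of lazily computed sequences, uses 0-based indices, and tracks index pairs in a `seen` set (a substitution for the "not already in Q" test) so that no pair is pushed twice. The function name `k_smallest_pairs` is hypothetical.

```python
import heapq


def k_smallest_pairs(a, b, k):
    """Return the k pairs (a[i], b[j]) with the smallest sums.

    `a` and `b` must be sorted in ascending order. Only the neighbours
    (i + 1, j) and (i, j + 1) of an already popped pair are ever pushed.
    """
    if not a or not b:
        return []
    heap = [(a[0] + b[0], 0, 0)]   # entries are (sum, i, j)
    seen = {(0, 0)}                # index pairs that were pushed at some point
    result = []
    while heap and len(result) < k:
        _, i, j = heapq.heappop(heap)
        result.append((a[i], b[j]))
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return result
```

On the example from the text, `k_smallest_pairs([0, 2, 2], [1, 2, 4], 3)` pops (0, 1), then (0, 2), then (2, 1), pushing only five of the nine combinations along the way.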
Pair of states      D
$s, f$              3
$q_1, q_4$          2
…                   1
$q_3, q_8$          0
$q_5, q_8$          0
$q_7, q_8$          0
$q_8, q_8$          0

Table 2



The same idea can be applied to our problem. For any pair of entering and exiting states $(p, q)$, let $G_{pq}$ be
the sequence of balanced paths from $p$ to $q$ ordered by their weight and let $G_{pq}^{i}$ be the $i$-th path. Following
similar reasoning, we know there is no need to compute the actual value of $G_{pq}^{i+1}$ before $G_{pq}^{i}$ is ever used as part
of a larger path in the search for the $k$ shortest accepting paths. Furthermore, $G_{pq}$ can be incrementally computed,
as we show next in Figure 13.



The algorithm operates as follows. First of all, instead of having a single priority queue, for every
relevant $G_{pq}$ we now have a corresponding priority queue $Q_{pq}$. $Q_{pq}$ is only responsible for finding the intermediate
"goal", i.e. balanced paths from $p$ to $q$, in increasing order of their weights. Further, only items of the form
$p \leadsto r$ are pushed into $Q_{pq}$, which allows us to use the following heuristic that only requires the reverse inside
weights,

$$H_{pq}(p \leadsto r : u) = u \otimes D(r, q)$$

Items are then proved in a top-down fashion, starting with $G_{sf}$. The search process can be described recursively
(Figure 13). Let the sequence in consideration be $G_{pq}$.

• If there is no balanced path from $p$ to $q$ using any parenthesis, all proofs only involve the Scan rule. As a
result, $G_{pq}$ can be incrementally computed without consulting any other sequence (lines 23-27).

• Otherwise, let $e$ and $e'$ be the pair of parentheses encountered during the search. Simply query $G_{n[e]p[e']}$
to get the shortest path (line 30), and only use the $(k+1)$-th shortest path after an instantiation proved
with the $k$-th one is popped (lines 14-18).

Though omitted in Figure 13 for a simpler presentation, a further optimization is essential to achieve the
desired efficiency. Observe in the second case above that exact knowledge of the shortest path from $n[e]$
to $p[e']$ is not required until an item proved using that path is popped. Therefore, instead of directly calling
FindKth($n[e]$, $p[e']$, 1), one can query $D(n[e], p[e'])$ to get the shortest distance. This is sufficient to compute the
priority and "promise" an actual proof, which will be realized once the item is popped. To distinguish actual
instantiations from those with a promise, we denote by $q_1 \overset{\sim}{\leadsto} q_2 : u$ an instantiation where the last step is based
on a promise.

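To see the new algorithm at work, consider again the WPDA in Figure 11. Relevant values of $D$ are listed
in Table 2. Then the following are instantiated before reaching the 1-shortest path.

$G_{sf}$:
$s \leadsto s : 0$                          ($H_{sf} = 0 + 3 = 3$, $Q_{sf} = \{s \overset{\sim}{\leadsto} q_6 : 2,\ s \overset{\sim}{\leadsto} f : 4\}$)
$s \overset{\sim}{\leadsto} q_6 : 2$        ($H_{sf} = 2 + 1 = 3$, $Q_{sf} = \{s \overset{\sim}{\leadsto} f : 4\}$)
$s \leadsto q_6 : 2$                        ($H_{sf} = 2 + 1 = 3$, $Q_{sf} = \{s \leadsto f : 3,\ s \overset{\sim}{\leadsto} f : 4\}$)
$s \leadsto f : 3$                          ($H_{sf} = 3 + 0 = 3$, $Q_{sf} = \{s \overset{\sim}{\leadsto} f : 4\}$)

$G_{q_1 q_4}$:
$q_1 \leadsto q_1 : 0$                      ($H_{q_1 q_4} = 0 + 2 = 2$, $Q_{q_1 q_4} = \{q_1 \leadsto q_2 : 1\}$)
$q_1 \leadsto q_2 : 1$                      ($H_{q_1 q_4} = 1 + 1 = 2$, $Q_{q_1 q_4} = \{q_1 \leadsto q_4 : 2\}$)
$q_1 \leadsto q_4 : 2$                      ($H_{q_1 q_4} = 2 + 0 = 2$, $Q_{q_1 q_4} = \{\}$)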





function FindKth($p$, $q$, $k$)    ▷ Finds the $k$-th element of $G_{pq}$
    if the result has been cached then
        return the cached result
    end if
    $S$ is a global variable storing proven items, initialized as empty outside the function
    $Q_{pq}$ is a min-priority queue, initialized as empty outside the function
    if $p \leadsto p$ not in $S$ then    ▷ First time called
        Push(($p \leadsto p : \bar{1}$), $Q_{pq}$) with priority $D(p, q)$
    end if
    while $Q_{pq}$ is not empty do
        if top of $Q_{pq}$ is proven via Scan then
            ($p \leadsto r : u$) ← Pop($Q_{pq}$)
        else    ▷ via Complete; further pushing is needed
            ($p \leadsto r : u$, $v$, $e$, $e'$, $j$) ← Pop($Q_{pq}$)
            $n[e] \leadsto p[e'] : w$ ← FindKth($n[e]$, $p[e']$, $j + 1$)
            $h$ ← $v \otimes w[e] \otimes w \otimes w[e'] \otimes D(r, q)$
            if $h \neq \bar{0}$ then
                Push(($p \leadsto r : v \otimes w[e] \otimes w \otimes w[e']$, $v$, $e$, $e'$, $j + 1$), $Q_{pq}$) with priority $h$    ▷ Store information for further pushing in the future
            end if
        end if
        Add $p \leadsto r : u$ to $S$
        for all transitions $e$ such that $p[e] = r$ do
            if $i[e] \in \Sigma \cup \{\epsilon\}$ then    ▷ Scan
                $h$ ← $u \otimes w[e] \otimes D(n[e], q)$
                if $h \neq \bar{0}$ then
                    Push(($p \leadsto n[e] : u \otimes w[e]$), $Q_{pq}$) with priority $h$
                end if
            else if $i[e] \in \Pi$ then    ▷ Complete; as the left antecedent
                for all transitions $e'$ such that $i[e'] = \overline{i[e]}$ and $D(n[e'], q) \neq \bar{0}$ do
                    $n[e] \leadsto p[e'] : v$ ← FindKth($n[e]$, $p[e']$, $1$)
                    $h$ ← $u \otimes w[e] \otimes v \otimes w[e'] \otimes D(n[e'], q)$
                    if $h \neq \bar{0}$ then
                        Push(($p \leadsto n[e'] : u \otimes w[e] \otimes v \otimes w[e']$, $u$, $e$, $e'$, $1$), $Q_{pq}$) with priority $h$    ▷ Store information for further pushing in the future
                    end if
                end for
            end if
        end for
        if $p \leadsto r : u$ is a goal item then
            Cache $p \leadsto r : u$, then return $p \leadsto r : u$
        end if
    end while
end function



Figure 13: Algorithm 2 
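The "store information for further pushing" step of Algorithm 2 follows a lazy-successor pattern that can be sketched in isolation. The Python sketch below is not Algorithm 2 itself: it assumes the tropical semiring (so $\otimes$ is ordinary addition with non-negative weights), folds the prefix and the two parenthesis transition weights into a single number, omits the heuristic $D$, and replaces the recursive FindKth calls with pre-sorted lists. The name `lazy_complete_merge` and the input layout are hypothetical.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple


def lazy_complete_merge(
    candidates: Sequence[Tuple[float, Sequence[float]]],
) -> Iterator[float]:
    """Yield combined weights in non-decreasing order.

    Each candidate is (prefix, sub_results). `prefix` stands in for the
    combined weight of the left antecedent and the two parenthesis
    transitions; `sub_results` is a sorted list standing in for what
    successive FindKth calls on the sub-goal would return (assumption).
    """
    heap: List[Tuple[float, int, int]] = []
    for idx, (prefix, subs) in enumerate(candidates):
        if subs:
            # Only the best combination of each candidate is pushed up front.
            heapq.heappush(heap, (prefix + subs[0], idx, 0))
    while heap:
        weight, idx, j = heapq.heappop(heap)
        yield weight
        prefix, subs = candidates[idx]
        if j + 1 < len(subs):
            # The successor (index j + 1) is pushed only after its
            # predecessor has been popped.
            heapq.heappush(heap, (prefix + subs[j + 1], idx, j + 1))
```

Each candidate contributes at most one entry to the queue at any time, so the queue size is bounded by the number of candidates rather than by the number of combinations.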







[Plot: running time against WPDA size $|Q| + |E|$ on logarithmic axes; legend entry recovered: Expansion.]

Figure 14: Timing on WPDAs with various sizes, k = 1000 



Notice no item is ever instantiated from G^j^g, which is exactly the desired result. 

Another benefit of grouping the search by the intermediate "goals" is that it imposes no special requirement on the semiring: multiplication need be neither commutative nor non-decreasing.
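To make "non-decreasing" concrete, the following Python sketch uses the tropical semiring as an assumed example (it is not taken from the experiments). With a negative weight, extending a path lowers its weight, so multiplication is no longer non-decreasing. The helper names are hypothetical.

```python
import math

# Tropical semiring: "plus" is min, "times" is +, zero is +inf, one is 0.
ZERO = math.inf
ONE = 0.0


def plus(a: float, b: float) -> float:
    """Semiring addition: keep the smaller weight."""
    return min(a, b)


def times(a: float, b: float) -> float:
    """Semiring multiplication: add the weights along a path."""
    return a + b


def extends_without_decreasing(a: float, b: float) -> bool:
    """True if multiplying a by b does not make the weight smaller."""
    return a <= times(a, b)
```

For instance, `extends_without_decreasing(3.0, 2.0)` holds, while `extends_without_decreasing(3.0, -1.0)` does not.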



6 Experimental Results 



We tested our algorithms on WPDAs generated from the machine translation system described in Iglesias et al. [2011]. Figure 14 compares the running time of the two algorithms with two previous approaches (expansion and pruned expansion with an oracle threshold) in finding the 1000 shortest paths on sample WPDAs of various sizes. Due to the exponential time complexity of WPDA expansion, the expansion baseline finishes within our time and memory limits only on the 5 smallest sample inputs.² For the pruned-expansion approach, we pick the oracle threshold (the exact weight difference between the shortest path and the 1000th shortest one) for each sample.
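The oracle threshold can be sketched as a small computation, assuming real-valued (tropical) path weights and that the weights of the candidate paths are already known, which is exactly what makes it an oracle. The function names below are hypothetical.

```python
from typing import Sequence


def oracle_threshold(path_weights: Sequence[float], k: int) -> float:
    """Weight gap between the shortest and the k-th shortest path."""
    ordered = sorted(path_weights)
    if not 1 <= k <= len(ordered):
        raise ValueError("k must be between 1 and the number of paths")
    return ordered[k - 1] - ordered[0]


def relative_threshold(path_weights: Sequence[float], k: int) -> float:
    """Oracle threshold as a fraction of the shortest path weight."""
    shortest = min(path_weights)
    return oracle_threshold(path_weights, k) / shortest
```

With weights `[10.0, 12.0, 10.5, 11.0]` and `k = 3`, the oracle threshold is `1.0`, which is 10% of the shortest path weight.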

Both of our algorithms are significantly faster than the expansion baseline, and their performance is comparable on smaller inputs. As the size of the WPDA grows, however, the advantage of the single-pass pre-computation of Algorithm 2 becomes clear and yields a large reduction in running time.

The performance of Algorithm 2 is close to the oracle best case of the pruned-expansion approach on almost all sample inputs. However, the perfect threshold varies significantly between samples: even for samples generated from the same system using different inputs, the ratio of the perfect threshold to the weight of the shortest path ranges from 0.35% to 160%, with a median of 7%. This supports our earlier claim that picking an appropriate threshold is difficult.



Figure 15 breaks down the running time of our algorithms on a large WPDA. Both of them spend most of 
their time on pre-computing the heuristics and the actual search takes very little time even with k as large as 
10000. 



² 2 CPU hours; 4 GB of memory.
[Plot: running time against $k$ on logarithmic axes. Legend entries: Algorithm 1 with H, Algorithm 2, Inside + Outside, Reverse Inside, Shortest Path.]



Figure 15: Timing on a WPDA with 398347 states and 951889 transitions 



7 Conclusion 

In this paper, we developed two algorithms for finding the k shortest paths of a WPDA. Previously, there were two approaches to this problem. The expansion approach expands the WPDA into an equivalent WFSA, which requires exponential time and space, and then finds the k shortest paths of the WFSA. The pruned-expansion approach first removes the states and transitions that are not on any path whose weight is within a given threshold of the shortest path, expands the pruned WPDA into a WFSA, and then finds the k shortest paths of that WFSA. This requires less time and space, but an appropriate threshold is almost impossible to set.

In contrast, our algorithms need no pruning or threshold selection and return the exact k shortest paths. The experimental results on real-world input show that Algorithm 2 is highly efficient: it adds very little overhead to the shortest-distance pre-computation, whose running time is comparable to that of the original shortest-path algorithm in Allauzen and Riley [2012].
A Proof of Properties of $C_M$

Theorem (Soundness). Any valid proof of an instantiation $q_1 \leadsto q_2 : u$ in $C_M$ induces a balanced path from $q_1$ to $q_2$ with weight $u$ in $M$.

Proof. Base: An axiom of the form $q_1 \leadsto q_2 : u$ must have $q_1 = q_2$ and $u = \bar{1}$. The yield of a proof using only the axiom is an empty path, thus a balanced path with weight $\bar{1}$.

Induction: Assume that all proofs with at most $n$ steps satisfy the theorem. For any proof of $q_1 \leadsto q_2 : u$ in $n + 1$ steps,

• If the last step uses the Scan rule, then it must be of the following form,

$$\frac{q_1 \leadsto q_3 : u_1}{q_1 \leadsto q_2 : u} \quad q_3 \xrightarrow{a} q_2 : u_2$$

where $u_1 \otimes u_2 = u$, $q_3 \xrightarrow{a} q_2 : u_2 \in E$, and $q_1 \leadsto q_3 : u_1$ is the outcome of some proof in at most $n$ steps. Let the induced path of the proof of $q_1 \leadsto q_3 : u_1$ be $\pi' = e_1 e_2 \cdots e_m$. The induced path of the whole proof is thus $\pi = e_1 e_2 \cdots e_m (q_3 \xrightarrow{a} q_2)$. By the induction hypothesis, $\pi'$ is a balanced path from $q_1$ to $q_3$ with weight $u_1$. As a result, $\pi$ is also balanced because $a \in \Sigma \cup \{\epsilon\}$ by definition of the logic; its weight is $w[\pi] = w[\pi'] \otimes u_2 = u_1 \otimes u_2 = u$.

• If the last step uses the Complete rule, then it must be of the following form,

$$\frac{q_1 \leadsto q_3 : u_1 \qquad q_4 \leadsto q_5 : u_3}{q_1 \leadsto q_2 : u} \quad q_3 \xrightarrow{a} q_4 : u_2, \; q_5 \xrightarrow{\bar{a}} q_2 : u_4$$

where $u_1 \otimes u_2 \otimes u_3 \otimes u_4 = u$, $a \in \Pi$ is an opening parenthesis, and $\bar{a} \in \hat{\Pi}$ is the corresponding closing parenthesis. As in the Scan rule case, one can prove that the induced path is a balanced path from $q_1$ to $q_2$ with weight $u$, using the associativity of $\otimes$.

□ 
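The two properties that the soundness proof tracks, balance and weight, can be checked directly on a concrete path. The Python sketch below assumes the tropical semiring and a hypothetical encoding in which a transition is a tuple `(source, label, weight, destination)`, labels such as `"(1"` and `")1"` denote matching parentheses, and `None` denotes $\epsilon$. None of these names come from the paper.

```python
from typing import List, Optional, Sequence, Tuple

# (source state, label, weight, destination state); encoding is an assumption.
Transition = Tuple[str, Optional[str], float, str]


def is_balanced(path: Sequence[Transition]) -> bool:
    """True if the parentheses along the path are well matched."""
    stack: List[str] = []
    for _, label, _, _ in path:
        if label is None:
            continue  # epsilon carries no parenthesis
        if label.startswith("("):
            stack.append(label[1:])
        elif label.startswith(")"):
            if not stack or stack.pop() != label[1:]:
                return False
    return not stack


def path_weight(path: Sequence[Transition]) -> float:
    """Tropical path weight: the sum of the transition weights."""
    return sum((w for _, _, w, _ in path), 0.0)
```

An empty path is balanced and has weight `0.0`, the tropical $\bar{1}$, which matches the base case of the proof.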

Theorem (Completeness). Any balanced path from an entering state $q_1$ to some state $q_2$ with weight $u$ in $M$ has a valid proof of an instantiation $q_1 \leadsto q_2 : u$ whose induced path is that path.

Proof. Base: For any empty balanced path from a state $q$ such that $q \leadsto q : \bar{1}$ is an axiom, the proof is just the axiom itself.

Induction: Assume that all balanced paths of length at most $n$ from any entering state satisfy the theorem. For any balanced path $\pi = e_1 e_2 \cdots e_{n+1}$ of length $n + 1$ from an entering state $q_1$ to $q_2$ with weight $u$,

• If $e_{n+1}$ is $q_3 \xrightarrow{a} q_2 : u_2$ with $a \in \Sigma \cup \{\epsilon\}$, then $\pi' = e_1 e_2 \cdots e_n$ is a balanced path of length $n$ and $w[\pi'] \otimes u_2 = u$. By the induction hypothesis, there exists a proof of $\pi'$ (via the item $q_1 \leadsto n[e_n] : w[\pi']$). Applying the Scan rule then gives a proof of $q_1 \leadsto q_2 : u$.

• If $e_{n+1}$ is $q_3 \xrightarrow{a} q_2 : u_2$ with $a \in \hat{\Pi}$, then there must be a $k \le n$ such that $e_k$ balances with $e_{n+1}$. As in the case above, one can prove the item by combining the proofs of $e_1 \cdots e_{k-1}$ and $e_{k+1} \cdots e_n$ with the Complete rule.

• If $e_{n+1}$ is $q_3 \xrightarrow{a} q_2 : u_2$ with $a \in \Pi$, the path cannot be balanced.

□ 
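The case analysis in the completeness proof can be read as a recursive procedure that builds a proof tree from a balanced path. The Python sketch below is one way to write it under assumed conventions: a transition is a tuple `(source, label, weight, destination)`, non-empty labels such as `"(1"` and `")1"` denote matching parentheses, and `None` denotes $\epsilon$. The function name `derive` and the tuple-based tree layout are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (source state, label, weight, destination state); encoding is an assumption.
Transition = Tuple[str, Optional[str], float, str]


def derive(path: Sequence[Transition]) -> tuple:
    """Build a proof tree for a balanced path, following the last transition."""
    if not path:
        return ("axiom",)
    last = path[-1]
    label = last[1]
    if label is None or label[0] not in "()":
        # Regular symbol or epsilon: the last step is Scan.
        return ("scan", derive(path[:-1]), last)
    if label[0] == "(":
        raise ValueError("a path ending in an opening parenthesis is not balanced")
    # Closing parenthesis: search backwards for the matching opening one.
    depth = 1
    for k in range(len(path) - 2, -1, -1):
        lab = path[k][1]
        if lab is None:
            continue
        if lab[0] == ")":
            depth += 1
        elif lab[0] == "(":
            depth -= 1
            if depth == 0:
                if lab[1:] != label[1:]:
                    raise ValueError("mismatched parentheses")
                # Complete: combine the prefix before e_k with the inner path.
                return ("complete", derive(path[:k]), path[k],
                        derive(path[k + 1:-1]), last)
    raise ValueError("unmatched closing parenthesis")
```

A path that is not balanced raises `ValueError` instead of producing a tree, which corresponds to the third case of the proof.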

Theorem (Unambiguity). Any balanced path from an entering state in $M$ has a unique proof in $C_M$.

Proof. This is very similar to the completeness proof, so we give only a sketch. First note that every empty path from an entering state has a unique proof (if the start state happens to have an incoming open-parenthesis transition, we consider the two as inducing the same axiom). For any longer path, if the last transition has a label from $\Sigma \cup \{\epsilon\}$, then the last step must use the Scan rule, with an antecedent that has a unique proof and that particular transition as the side condition; otherwise the last transition must have a closing parenthesis, which means a unique application of the Complete rule. □